Synthetic Data

Author

김보람

Published

March 13, 2023

Research Trends in Synthetic Data

  1. Advances in deep learning \(\to\) synthetic data generation techniques; generative adversarial networks (GANs) and similar models synthesize images, speech, natural language, and more

  2. Validation techniques \(\to\) minimizing the gap between real and synthetic data; verifying that the synthetic data closely resembles the real data it imitates

  3. Responses to privacy concerns \(\to\) research on masking personal information, or on generating comparable synthetic data without using personal information at all

  4. Application areas \(\to\) computer vision, natural language processing, healthcare, robotics, etc.

  5. Demonstrated benefits \(\to\) training on synthetic data improves model performance in some cases

Papers on Synthetic Data

  1. "Research Trends in Synthetic Data Generation Using Deep Learning" (Journal of the Korean Institute of Information Technology, 2019)
  • Surveys and analyzes research trends in deep-learning-based synthetic data generation
  2. "Performance Analysis of Autonomous Vehicle Recognition Models Using Synthetic Data" (Journal of the Korea Society of Computer and Information, 2019)
  • Trains an autonomous-vehicle recognition model on synthetic data and analyzes its performance
  3. "Synthetic Data Generation Techniques for Medical Imaging Data" (Journal of the Korean Institute of Information Scientists and Engineers, 2020)
  • Introduces synthetic data generation for medical images and runs experiments training a medical image recognition model on the generated data
  4. "A Study on Improving Action Recognition Performance Using Synthetic Data Generation" (Journal of the Korean Institute of Intelligent Systems, 2020)
  • Trains an action recognition model with synthetic data and experiments with improving its performance
  5. "A Study on Problem-Solving Approaches in Natural Language Processing Using Synthetic Data" (Journal of the Korean Society for Internet Information, 2021)
  • Proposes ways to address NLP problems with synthetic data and runs experiments training NLP models on the generated data

The material above was suggested by ChatGPT.

There are so few papers or code examples on synthetic data… where should I look for them…

Example 1: Using Faker

!pip install Faker
Collecting Faker
  Downloading Faker-17.6.0-py3-none-any.whl (1.7 MB)
...
Successfully installed Faker-17.6.0

Faker: a library for generating fake data

import numpy as np
import pandas as pd
from faker import Faker
fake = Faker()

# Generate a fake name, address, phone number, and email
name = fake.name()
address = fake.address()
phone_number = fake.phone_number()
email = fake.email()
data = np.random.rand(100)
names = [fake.name() for i in range(100)]
addresses = [fake.address() for i in range(100)]
ages = [fake.random_int(min=18, max=80, step=1) for i in range(100)]
df = pd.DataFrame({'Name': names, 'Address': addresses, 'Age': ages})
df # the 100 generated fake records
Name Address Age
0 Jason Williams 88899 Miller Fall Apt. 222\nNew Eric, VA 87882 36
1 Brett Ramos 81526 Jacqueline Corners Suite 818\nJessicaton... 73
2 Mario Mitchell 54833 Cox Lake Suite 142\nChristianville, PW 0... 22
3 David Ryan 80999 Melissa Club\nNorth Curtis, MI 28118 32
4 Marcus Adkins 6770 Jessica Radial\nFloresberg, AR 35810 58
... ... ... ...
95 Francisco Porter 84916 Brown Mission\nWest Williamborough, MI 3... 43
96 Alexander Martinez 3583 David View\nPatrickfurt, IA 27730 50
97 Joshua Torres 4304 Macdonald Lake Suite 363\nLake Melissashi... 24
98 Lauren Morris 584 Walker Squares Suite 817\nSharonshire, MO ... 48
99 Jay Benson 64335 Smith Rest Suite 370\nNorth Robertland, ... 54

100 rows × 3 columns
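Faker is convenient, but the underlying idea — sampling each field from a pool of plausible values with a seeded random source — can be sketched with only the standard library. The name pools and field layout below are made up for illustration, not Faker's actual data:

```python
import random

# Hypothetical stand-in for Faker: sample each field from a small pool.
FIRST = ["Alice", "Brian", "Carla", "David", "Elena"]
LAST = ["Kim", "Lopez", "Nguyen", "Smith", "Walker"]

def fake_person(rng):
    """Return one synthetic record as a dict."""
    return {
        "Name": f"{rng.choice(FIRST)} {rng.choice(LAST)}",
        "Age": rng.randint(18, 80),
        "Phone": f"{rng.randint(200, 999)}-{rng.randint(1000, 9999)}",
    }

rng = random.Random(42)  # seed so the "fake" data is reproducible
records = [fake_person(rng) for _ in range(100)]
print(len(records))  # -> 100
```

Seeding matters for reproducible experiments; Faker itself supports the same idea via `Faker.seed(...)`.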

Example 2: Using a GAN

!pip install tensorflow
import tensorflow as tf
from tensorflow.keras import layers
Collecting tensorflow
  Downloading tensorflow-2.11.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
...
Successfully installed absl-py-1.4.0 astunparse-1.6.3 cachetools-5.3.0 flatbuffers-23.3.3 gast-0.4.0 google-auth-2.16.2 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.51.3 h5py-3.8.0 keras-2.11.0 libclang-15.0.6.1 markdown-3.4.1 oauthlib-3.2.2 opt-einsum-3.3.0 protobuf-3.19.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-oauthlib-1.3.1 rsa-4.9 tensorboard-2.11.2 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.11.0 tensorflow-estimator-2.11.0 tensorflow-io-gcs-filesystem-0.31.0 termcolor-2.2.0 wrapt-1.15.0
# Define the generator model
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(256, input_shape=(100,), use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Dense(512, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Dense(28*28*1, use_bias=False, activation='tanh'))
    model.add(layers.Reshape((28, 28, 1)))

    return model
make_generator_model()

- Generator model definition

  • Takes a 100-dimensional vector as input and generates an output image the same size as the input images in the dataset

  • batch normalization \(\to\) leaky ReLU activation function \(\to\) dense layer with a hyperbolic tangent activation function

# Define the discriminator model
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Flatten(input_shape=(28, 28, 1)))
    model.add(layers.Dense(512, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Dense(256, use_bias=False))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Dense(1))

    return model
make_discriminator_model()

- Discriminator model

  • Takes an image as input and outputs a scalar value (from the final layer) indicating whether the image is real

  • batch normalization, leaky ReLU activation function, and a final dense layer with no activation function (raw logits)

# Define the GAN model
def make_gan_model(generator, discriminator):
    discriminator.trainable = False

    model = tf.keras.Sequential()
    model.add(generator)
    model.add(discriminator)

    return model
generator = make_generator_model()
discriminator = make_discriminator_model()
gan = make_gan_model(generator, discriminator)

- GAN model

  • Combines the generator and discriminator into a single model

  • generator: trained to produce images that the discriminator classifies as real

  • discriminator: trained to accurately distinguish real images from the images the generator produces

# Define the loss functions and optimizers
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

  • Training a GAN requires defining the loss functions and optimizers

  • binary cross-entropy loss is used for both models

  • Adam optimizer with a learning rate of 1e-4 (0.0001)
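As a sanity check on these loss definitions, the from-logits binary cross-entropy can be reproduced in plain Python. The function below is the standard numerically stable form, max(x, 0) − x·y + log(1 + e^(−|x|)), and the toy logit values are made up:

```python
import math

def bce_with_logits(label, logit):
    # numerically stable binary cross-entropy on a raw logit
    return max(logit, 0) - logit * label + math.log(1 + math.exp(-abs(logit)))

# discriminator: real samples should score high (label 1), fakes low (label 0)
real_loss = bce_with_logits(1.0, 2.0)   # confident "real" -> small loss
fake_loss = bce_with_logits(0.0, -2.0)  # confident "fake" -> small loss
total = real_loss + fake_loss

# a completely undecided logit of 0 costs log(2) per example
print(round(bce_with_logits(1.0, 0.0), 4))  # -> 0.6931
```

This mirrors discriminator_loss above: the total is the sum of the real-batch loss against a target of ones and the fake-batch loss against a target of zeros.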

# Train the GAN model
def train_gan(gan_model, dataset, epochs, batch_size):
    generator, discriminator = gan_model.layers
    discriminator.trainable = True  # re-enable: make_gan_model froze it, but we apply its gradients manually below

    for epoch in range(epochs):
        for batch in dataset:
            noise = tf.random.normal([batch_size, 100])

            with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
                generated_images = generator(noise, training=True)

                real_output = discriminator(batch, training=True)
                fake_output = discriminator(generated_images, training=True)

                gen_loss = generator_loss(fake_output)
                disc_loss = discriminator_loss(real_output, fake_output)

            gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
            gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

            generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
            discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))

        print("Epoch {} complete".format(epoch))
# Generate synthetic images using the trained generator
def generate_images(generator, num_images):
    noise = tf.random.normal([num_images, 100])
    generated_images = generator(noise, training=False)
    generated_images = (generated_images + 1) / 2.0  # scale images to [0, 1]
    return generated_images.numpy()

  • Generate synthetic images with the trained generator
# Load and preprocess real images for training
(x_train, _), (_, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
x_train = (x_train - 127.5) / 127.5  # scale to [-1, 1] to match the generator's tanh output
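The pieces are never wired together above, so as a rough, self-contained illustration of the same adversarial loop, here is a tiny NumPy GAN that learns to mimic a 1-D Gaussian: a linear generator against a logistic-regression discriminator, updated with the same real/fake losses. All hyperparameters and the target distribution are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1.0, 0.0          # generator G(z) = a*z + b
w, c = 0.1, 0.0          # discriminator logit D(x) = w*x + c
lr, batch = 0.05, 64

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

for step in range(3000):
    real = rng.normal(4.0, 1.25, batch)      # "real" data: N(4, 1.25)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # --- discriminator step: push D(real) toward 1, D(fake) toward 0 ---
    dr = sigmoid(w * real + c) - 1.0         # d(loss)/d(logit) for real (label 1)
    df = sigmoid(w * fake + c)               # d(loss)/d(logit) for fake (label 0)
    w -= lr * np.mean(dr * real + df * fake)
    c -= lr * np.mean(dr + df)

    # --- generator step: non-saturating loss -log D(G(z)) ---
    dg = (sigmoid(w * fake + c) - 1.0) * w   # gradient through the discriminator
    a -= lr * np.mean(dg * z)
    b -= lr * np.mean(dg)

samples = a * rng.normal(0.0, 1.0, 10000) + b
print(round(float(samples.mean()), 2))       # should drift toward 4
```

The two alternating updates are exactly the roles of discriminator_loss and generator_loss in the TensorFlow code; at equilibrium the generator's output distribution matches the real one.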

Example 3: Using a VAE (need to understand the concepts first…)

import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
# load and preprocess the real images for training
(train_images, _), (_, _) = keras.datasets.mnist.load_data()
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = train_images / 255.0  # normalize to [0, 1] to match the decoder's sigmoid output and BCE loss
# define the VAE model
latent_dim = 2

encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name="encoder")

latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")

class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = keras.backend.shape(z_mean)[0]
        dim = keras.backend.int_shape(z_mean)[1]
        epsilon = keras.backend.random_normal(shape=(batch, dim))
        return z_mean + keras.backend.exp(0.5 * z_log_var) * epsilon

z = Sampling()([z_mean, z_log_var])
outputs = decoder(z)
vae = keras.Model(encoder_inputs, outputs, name="vae")
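The Sampling layer implements the reparameterization trick, z = μ + exp(½·log σ²)·ε with ε ~ N(0, 1), which keeps sampling differentiable with respect to μ and log σ². A quick NumPy check (the example μ and log σ² are chosen arbitrarily) confirms the samples have the intended statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
z_mean, z_log_var = 1.0, np.log(4.0)   # target: mean 1, variance 4 (std 2)

eps = rng.standard_normal(200_000)
z = z_mean + np.exp(0.5 * z_log_var) * eps  # same formula as the Sampling layer

print(round(float(z.mean()), 2), round(float(z.std()), 2))  # close to 1.0 and 2.0
```

Because the randomness lives entirely in ε, gradients flow through z_mean and z_log_var during training, which is what makes the VAE trainable by backpropagation.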

# define the loss function (sum the per-pixel BCE over each image so its shape matches the KL term)
reconstruction_loss = keras.backend.sum(
    keras.losses.binary_crossentropy(encoder_inputs, outputs), axis=(1, 2)
)
kl_loss = 1 + z_log_var - keras.backend.square(z_mean) - keras.backend.exp(z_log_var)
kl_loss = keras.backend.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = keras.backend.mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)

# compile the model
vae.compile(optimizer='adam')

# train the model
vae.fit(train_images, epochs=10, batch_size=128)
Epoch 1/10
InvalidArgumentError: Graph execution error:
Incompatible shapes: [128,28,28] vs. [128]

(The fit call failed here in the original run: binary_crossentropy reduces only the last axis, so the per-image reconstruction loss kept shape [128, 28, 28] and could not be added to the [128]-shaped KL term. Summing the reconstruction loss over the spatial axes resolves the mismatch.)
# generate synthetic data
n_samples = 10
random_latent_vectors = np.random.normal(size=(n_samples, latent_dim))
generated_images = decoder.predict(random_latent_vectors)
1/1 [==============================] - 0s 55ms/step
# display the generated images
for i in range(n_samples):
    plt.imshow(generated_images[i].reshape(28, 28))
    plt.show()

Example 4: A simple feedforward neural network

!pip install scikit-learn
Collecting scikit-learn
  Downloading scikit_learn-1.2.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.8 MB)
...
Successfully installed joblib-1.2.0 scikit-learn-1.2.2 threadpoolctl-3.1.0
# import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers
# generate and preprocess the real data for training
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
scaler = StandardScaler()
X = scaler.fit_transform(X)
# define the feedforward neural network
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(X.shape[1],)),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')
# train the model
model.fit(X, y, epochs=100, batch_size=32)
Epoch 1/100
32/32 [==============================] - 0s 553us/step - loss: 0.5901
Epoch 2/100
32/32 [==============================] - 0s 472us/step - loss: 0.5224
...
Epoch 99/100
32/32 [==============================] - 0s 460us/step - loss: 0.2349
Epoch 100/100
32/32 [==============================] - 0s 459us/step - loss: 0.2347
<keras.callbacks.History at 0x7f71e8106490>
# generate synthetic data
n_samples = 10
random_vectors = np.random.normal(size=(n_samples, X.shape[1]))
generated_data = model.predict(random_vectors)
1/1 [==============================] - 0s 22ms/step
# display the generated data
print(generated_data)
[[0.00134968]
 [0.11726338]
 [0.6508618 ]
 [0.9898428 ]
 [0.9719319 ]
 [0.9829191 ]
 [0.6400001 ]
 [0.00335612]
 [0.15759172]
 [0.00645017]]
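The intro of this post notes that validating synthetic data against the real distribution is an active research topic. A minimal, self-contained sketch of one such check, using SciPy's two-sample Kolmogorov-Smirnov test on stand-in 1-D arrays (`real` and `synthetic` here are placeholders, not the variables from the example above):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
real = rng.normal(loc=0.0, scale=1.0, size=1000)       # stand-in for real data
synthetic = rng.normal(loc=0.0, scale=1.0, size=1000)  # stand-in for generated data

# Two-sample KS test: a large p-value means we cannot reject
# that both samples come from the same distribution.
stat, p_value = ks_2samp(real, synthetic)
print(f"KS statistic: {stat:.4f}, p-value: {p_value:.4f}")
```

Replacing the stand-in arrays with a real feature column and the corresponding generated column gives a quick per-feature similarity check.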

Example 5: Kernel Density Estimation (KDE)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neighbors import KernelDensity

# binary classification data: 1000 samples, 10 features
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

# fit a Gaussian kernel density estimate to the real data
kde = KernelDensity(kernel='gaussian', bandwidth=0.5)
kde.fit(X)
KernelDensity(bandwidth=0.5)
n_samples = 10
synthetic_data = kde.sample(n_samples)
synthetic_data
array([[-6.40543316e-01, -2.00816029e-01, -1.38112318e+00,
        -5.04647026e-01, -1.16412835e-01,  2.31176462e-01,
         2.67275989e+00, -2.43788602e+00, -1.07609433e+00,
        -7.60376957e-01],
       [-1.96154361e+00, -5.44035464e-01, -1.29299914e+00,
        -1.79243149e+00, -7.91254247e-01, -1.49100460e+00,
         2.59878119e+00, -7.00449777e-01,  2.46705029e+00,
        -8.13271665e-01],
       [ 4.92102583e-01,  4.61885958e-01,  9.06165151e-01,
         7.09125038e-01,  3.62018015e-01,  1.33662227e+00,
        -1.73166578e+00, -1.54776550e+00, -6.47684988e-01,
         1.50384591e+00],
       [-1.37270120e+00,  1.11711271e+00,  1.69950848e-01,
         1.71370541e+00, -9.95787406e-01,  9.37075645e-01,
         6.86130469e-01,  2.68568255e+00,  5.23540728e-01,
         1.28789991e+00],
       [-6.09739471e-01,  2.74003692e-01, -4.99186042e-01,
         7.55177619e-01,  1.10635851e+00, -6.33249957e-01,
         1.45340497e+00,  1.50254494e-01,  1.70527839e+00,
        -1.68688562e+00],
       [ 2.82712325e-01,  6.99937229e-01,  2.35281704e-01,
        -1.38757829e+00, -3.07830479e-01, -1.66666566e+00,
         3.49788502e-01,  1.81251770e+00,  1.41807360e+00,
        -1.14791545e-01],
       [ 6.89280678e-01, -2.94374178e-03,  1.00903417e+00,
         1.42917675e+00,  9.38636180e-01,  6.54756154e-01,
        -1.50349162e+00, -1.72798315e-01,  7.09793089e-01,
         2.34657027e+00],
       [ 1.40766106e+00,  1.33457643e+00,  6.06365408e-02,
        -1.61743252e+00,  1.29343257e+00, -4.73977756e-01,
         2.15440056e-01, -1.41240310e+00, -3.73397331e+00,
         1.15540476e+00],
       [-4.13886212e-02,  1.31264769e+00,  2.42505327e-01,
        -9.10608344e-01,  1.92476492e+00,  5.73378858e-01,
        -9.79660592e-01,  1.66989518e+00, -3.47583358e-02,
         2.49848587e-01],
       [ 1.73735646e-01, -8.64171575e-01, -1.98493560e-01,
         1.64845898e-01, -8.99243472e-01, -1.51645063e-01,
        -8.09655616e-01, -7.19465453e-01, -9.90817553e-01,
        -1.45541083e+00]])
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y)
ax.scatter(synthetic_data[:, 0], synthetic_data[:, 1], c='r')
plt.show()
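Besides `sample`, `KernelDensity` also exposes `score_samples`, which returns per-point log-density and can serve as a quick sanity check that the sampled points are about as plausible as the real ones. A self-contained sketch on stand-in 2-D data (not the `X` from the example above):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(42)
X_real = rng.normal(size=(500, 2))  # stand-in for real data

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X_real)
X_syn = kde.sample(100, random_state=42)

# score_samples gives log-density under the fitted KDE; samples drawn
# from the KDE itself should score in the same range as real points.
print("real mean log-density:     ", kde.score_samples(X_real).mean())
print("synthetic mean log-density:", kde.score_samples(X_syn).mean())
```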

Example 6: Random Sampling

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)

n_samples = 10
# note: np.random.rand draws uniformly from [0, 1), so these points ignore
# the real feature distribution entirely and only illustrate the idea
synthetic_data = np.random.rand(n_samples, X.shape[1])
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=y)
ax.scatter(synthetic_data[:, 0], synthetic_data[:, 1], c='r')
plt.show()
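Uniform sampling like the above does not follow the real feature distribution, which is why the red points sit apart from the data in the plot. A common alternative is bootstrap resampling, drawing real rows with replacement; a minimal sketch on stand-in data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 10))  # stand-in for the real dataset

# resample real rows with replacement (bootstrap), so each synthetic
# row keeps the original feature ranges and cross-feature correlations
idx = rng.integers(0, len(X), size=10)
synthetic_rows = X[idx]
print(synthetic_rows.shape)  # (10, 10)
```

The trade-off is that bootstrapped rows are exact copies of real records, so this adds no privacy protection on its own.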

Example 7: Linear Regression Model

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# one-feature regression data with Gaussian noise (std = 10)
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# fit a linear model to the real data; its predictions serve as synthetic targets
lr = LinearRegression()
lr.fit(X, y)
LinearRegression()
n_samples = 10
# keep the random inputs in a variable so the same x values are used
# for both prediction and plotting
X_new = np.random.rand(n_samples, 1)
synthetic_data = lr.predict(X_new)
fig, ax = plt.subplots()
ax.scatter(X, y)
ax.scatter(X_new, synthetic_data, c='r')
plt.show()
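`lr.predict` places every synthetic point exactly on the fitted line, so the synthetic data has none of the scatter seen in the real data. One common refinement (a sketch, not from the original example) is to add Gaussian noise matched to the residual standard deviation of the fit:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
lr = LinearRegression().fit(X, y)

rng = np.random.default_rng(42)
# sample new inputs inside the observed range of X
X_new = rng.uniform(X.min(), X.max(), size=(10, 1))

# estimate the residual spread of the fit and add matching noise, so
# synthetic points scatter around the line the way the real ones do
resid_std = np.std(y - lr.predict(X))
y_syn = lr.predict(X_new) + rng.normal(0, resid_std, size=10)
print(y_syn.shape)  # (10,)
```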